Rutabaga by any other name: extracting biological names
Abstract
As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93-95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75-80% range. Multiple factors may be involved, including absence of shared training and test sets for rigorous measures of progress, lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise performance of named entity tagging to a level where it can be used to perform tasks of biological importance.
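The kind of simple lexical matching exercise mentioned in the abstract, and the precision and recall scores used to measure it, can be sketched roughly as follows. This is only an illustrative sketch, not the authors' procedure: the function names `match_names` and `precision_recall_f1`, the two-entry lexicon, the example sentence, and the gold annotation are all assumptions made up for this example. It shows how a term that is both a gene name and an ordinary English word produces a false positive under pure dictionary lookup.

```python
import re

def match_names(text, lexicon):
    """Return sorted (start, end, term) spans where a lexicon term occurs as a whole word."""
    spans = []
    for term in lexicon:
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), term))
    return sorted(spans)

def precision_recall_f1(predicted, gold):
    """Score predicted spans against gold spans using exact span match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical example: "white" and "hedgehog" stand in for a gene-name lexicon.
text = "The white gene affects eye color, but white blood cells were also counted."
lexicon = ["white", "hedgehog"]

# Dictionary lookup flags every occurrence of "white".
predicted = [(start, end) for start, end, _ in match_names(text, lexicon)]

# Assumed gold annotation: only the first "white" names a gene.
first = text.index("white")
gold = [(first, first + len("white"))]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(predicted, precision, recall, round(f1, 3))
```

In this toy case the matcher finds the one true gene mention but also tags "white blood cells", so recall is perfect while precision is halved, which is one concrete form of the term ambiguity the abstract lists among the obstacles.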
Similar resources
Extracting Protein Names from Biological Literature
Named entity recognition is an essential task in extracting biological knowledge. In biological corpora, protein names and other terminology are mixed into natural-language sentences. Sometimes whether an abbreviation is a protein name depends on the context. Protein names are often composed of gene names, cell names, or even drug names. Moreover, the number of newly coined protein names i...
Throne Name in the Achaemenid period
The Achaemenid kings after Darius I adopted Darius, Xerxes, or Artaxerxes as their throne names when they were nominated for, or succeeded to, the throne. Each of these kings chose one of these names according to what had happened to him before he reached the throne, how he attained it, and his plans and program. These names are not personal and real names, but they ...
The Place-Name as an Intangible Place of Memory (A Holistic Approach in Reading the Place-Names through a Comparative-Analytical Study on the Character of Name and Place)
Understanding architectural heritage and its various aspects has always been a focus for the international conservation community. In recent decades, even though place-names are part of living history as well as cultural heritage, they have constantly faced rapid changes. As such, in the conservation literature, most studies have skipped ad...
NLProt: extracting protein names and sequences from papers
Automatically extracting protein names from the literature and linking these names to the associated entries in sequence databases is becoming increasingly important for annotating biological databases. NLProt is a novel system that combines dictionary- and rule-based filtering with several support vector machines (SVMs) to tag protein names in PubMed abstracts. When considering partially tagge...
How to Pronounce Hebrew Names
This paper addresses the problem of determining the correct pronunciation of people’s names written in Hebrew, by extracting clues from the way the same name is written in other languages, and by using a database of names whose pronunciation is known to guess the correct pronunciation of a given name. Names differ from other words in a language because they do not follow the language’s fixed s...
Journal:
- Journal of biomedical informatics
Volume 35, Issue 4
Pages: -
Publication date: 2002